Optimizing Checkpointing Performance in Spark
Authors
Abstract
Similar Resources
Optimizing Shuffle Performance in Spark
Spark [6] is a cluster framework that performs in-memory computing, with the goal of outperforming disk-based engines like Hadoop [2]. As with other distributed data processing platforms, it is common to collect data in a many-to-many fashion, a stage traditionally known as the shuffle phase. In Spark, many sources of inefficiency exist in the shuffle phase that, once addressed, potentially prom...
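To make the shuffle phase concrete, here is a minimal Spark (Scala) sketch, not code from the paper above: the reduceByKey step forces a many-to-many shuffle, and the checkpoint() call afterwards shows where lineage would be truncated, matching the checkpointing theme of this article. The application name, input path, and checkpoint directory are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object ShuffleSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("shuffle-sketch").getOrCreate()
        val sc = spark.sparkContext
        sc.setCheckpointDir("/tmp/spark-checkpoints") // placeholder directory

        val counts = sc.textFile("hdfs:///data/input.txt") // placeholder input
          .flatMap(_.split("\\s+"))                        // split lines into words
          .map(word => (word, 1))
          .reduceByKey(_ + _) // shuffle: records with equal keys meet on one partition

        counts.checkpoint()   // mark the shuffled RDD for checkpointing
        counts.count()        // the action materializes the RDD and writes the checkpoint
        spark.stop()
      }
    }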
Optimizing VM Checkpointing for Restore Performance in VMware ESXi
Irene Zhang started her presentation by explaining that checkpointing is similar to “suspend,” but while taking a checkpoint, the virtual machine can continue its execution. Because checkpointing is used for fault tolerance, taking a checkpoint can be done quickly, in less than a few seconds; restoring from the checkpoint, although slow, hasn’t been a problem; however, recent applications, such...
Memory Exclusion: Optimizing the Performance of Checkpointing Systems
Checkpointing systems are a convenient way for users to make their programs fault-tolerant by intermittently saving program state to disk, and restoring that state following a failure. The main concern with checkpointing is the overhead that it adds to the running time of the program. This paper describes memory exclusion, an important class of optimizations that reduce the overhead of checkpointin...
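The memory-exclusion idea described above can be illustrated with a small, hypothetical sketch (not the paper's implementation): at checkpoint time, regions that are dead (never read again) or clean (unchanged since the last checkpoint) are skipped, and only the remaining regions are written to stable storage. The Region type and its live/dirty flags are assumptions made purely for illustration.

    import java.io.{DataOutputStream, OutputStream}

    // Hypothetical memory region with liveness and dirtiness flags.
    case class Region(id: Int, data: Array[Byte], live: Boolean, dirty: Boolean)

    object MemoryExclusionSketch {
      // Write only live, dirty regions; excluded regions add no checkpoint overhead.
      def writeCheckpoint(regions: Seq[Region], out: OutputStream): Unit = {
        val dos = new DataOutputStream(out)
        for (r <- regions if r.live && r.dirty) {
          dos.writeInt(r.id)          // which region is being saved
          dos.writeInt(r.data.length) // its length, followed by its bytes
          dos.write(r.data)
        }
        dos.flush()
      }
    }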
The Performance of Consistent Checkpointing
Consistent checkpointing provides transparent fault tolerance for long-running distributed applications. In this paper we describe performance measurements of an implementation of consistent checkpointing. Our measurements show that consistent checkpointing performs remarkably well. We executed eight compute-intensive distributed applications on a network of diskless Sun workstations, comparin...
Checkpointing Orchestration for Performance Improvement
Checkpointing is a widely used mechanism for supporting fault tolerance in high-performance computing (HPC), but it is notorious for its expensive disk access. Parallel file systems such as Lustre, GPFS, and PVFS are widely deployed on supercomputers to provide fast I/O bandwidth for general data-intensive applications. However, the unique feature of checkpointing makes it impossible to benefit from the ...
Journal
Journal title: DEStech Transactions on Computer Science and Engineering
Year: 2017
ISSN: 2475-8841
DOI: 10.12783/dtcse/csma2017/17315